Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available without charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.
- Free, publicly-accessible full text available May 1, 2026
- Deep neural networks trained using gradient descent with a fixed learning rate eta often operate in the regime of "edge of stability" (EOS), where the largest eigenvalue of the Hessian equilibrates about the stability threshold 2/eta. In this work, we present a fine-grained analysis of the learning dynamics of (deep) linear networks (DLNs) within the deep matrix factorization loss beyond EOS. For DLNs, loss oscillations beyond EOS follow a period-doubling route to chaos. We theoretically analyze the regime of the 2-period orbit and show that the loss oscillations occur within a small subspace, with the dimension of the subspace precisely characterized by the learning rate. The crux of our analysis lies in showing that the symmetry-induced conservation law for gradient flow, defined as the balancing gap among the singular values across layers, breaks at EOS and decays monotonically to zero. Overall, our results contribute to explaining two key phenomena in deep networks: (i) shallow models and simple tasks do not always exhibit EOS; and (ii) oscillations occur within top features. We present experiments to support our theory, along with examples demonstrating how these phenomena occur in nonlinear networks and how they differ from networks with benign landscapes, such as DLNs.
  Free, publicly-accessible full text available March 25, 2026
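A minimal, illustrative sketch of the setting above, assuming only what the abstract states: gradient descent on a two-layer deep matrix factorization, tracking the loss and the balancing gap ||W1 W1^T - W2^T W2||_F that the analysis shows breaks and decays beyond EOS. Dimensions, initialization, and the learning rate are placeholder choices, not the paper's.

    # Illustrative deep matrix factorization dynamics; not the authors' code.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 10
    M = rng.standard_normal((d, d))          # target matrix to factorize
    W1 = 0.1 * rng.standard_normal((d, d))   # first layer
    W2 = 0.1 * rng.standard_normal((d, d))   # second layer
    eta = 0.05                               # fixed learning rate; a larger eta
                                             # pushes sharpness toward 2/eta

    for t in range(2001):
        E = W2 @ W1 - M                      # residual, loss = 0.5 * ||E||_F^2
        g1 = W2.T @ E                        # gradient w.r.t. W1
        g2 = E @ W1.T                        # gradient w.r.t. W2
        W1 -= eta * g1
        W2 -= eta * g2
        # balancing gap: conserved under gradient flow, decays beyond EOS
        gap = np.linalg.norm(W1 @ W1.T - W2.T @ W2)
        if t % 400 == 0:
            print(f"step {t:4d}  loss {0.5 * np.sum(E**2):10.4f}  gap {gap:8.4f}")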
- Free, publicly-accessible full text available March 5, 2026
- Large-scale foundation models have demonstrated exceptional performance in language and vision tasks. However, the numerous dense matrix-vector operations involved in these large networks pose significant computational challenges during inference. To address these challenges, we introduce the Block-Level Adaptive STructured (BLAST) matrix, designed to learn and leverage efficient structures prevalent in the weight matrices of linear layers within deep learning models. Compared to existing structured matrices, the BLAST matrix offers substantial flexibility, as it can represent various types of structures that are either learned from data or computed from pre-existing weight matrices. We demonstrate the efficiency of using the BLAST matrix for compressing both language and vision tasks, showing that (i) for medium-sized models such as ViT and GPT-2, training with BLAST weights boosts performance while reducing complexity by 70% and 40%, respectively; and (ii) for large foundation models such as Llama-7B and DiT-XL, the BLAST matrix achieves a 2x compression while exhibiting the lowest performance degradation among all tested structured matrices. Our code is available at https://github.com/changwoolee/BLAST.
  Free, publicly-accessible full text available December 6, 2025
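The exact BLAST parameterization is defined in the paper and the linked repository; what follows is only a plausible block low-rank sketch in its spirit, in which each block W_ij = U_i diag(S_ij) V_j^T shares left and right factors along block-rows and block-columns. All names and dimensions here are assumptions for illustration.

    # Hypothetical block-structured matvec in the spirit of BLAST; see the
    # linked repository for the actual factorization.
    import numpy as np

    rng = np.random.default_rng(1)
    b, p, r = 4, 64, 8                       # block grid, block size, per-block rank
    U = rng.standard_normal((b, p, r))       # shared left factors per block-row
    V = rng.standard_normal((b, p, r))       # shared right factors per block-column
    S = rng.standard_normal((b, b, r))       # block-specific diagonal couplings

    def block_lowrank_matvec(x):
        """y = W x with W_ij = U_i diag(S_ij) V_j^T, never forming W."""
        xb = x.reshape(b, p)
        t = np.einsum("jpr,jp->jr", V, xb)   # project each input block once
        z = np.einsum("ijr,jr->ir", S, t)    # mix blocks through the couplings
        return np.einsum("ipr,ir->ip", U, z).reshape(b * p)

    x = rng.standard_normal(b * p)
    y = block_lowrank_matvec(x)              # ~O(b*p*r + b^2*r) vs O((b*p)^2) dense
    print(y.shape)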
- Free, publicly-accessible full text available December 5, 2025
- In this work, we present a novel approach for compressing overparameterized models, developed through studying their learning dynamics. We observe that for many deep models, updates to the weight matrices occur within a low-dimensional invariant subspace. For deep linear models, we demonstrate that their principal components are fitted incrementally within a small subspace, and use these insights to propose a compression algorithm for deep linear networks that involves decreasing the width of their intermediate layers. We empirically evaluate the effectiveness of our compression technique on matrix recovery problems. Remarkably, by using an initialization that exploits the structure of the problem, we observe that our compressed network converges faster than the original network, consistently yielding smaller recovery errors. We substantiate this observation by developing a theory focused on deep matrix factorization. Finally, we empirically demonstrate how our compressed model has the potential to improve the utility of deep nonlinear models. Overall, our algorithm improves the training efficiency by more than 2x, without compromising generalization.
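A hedged sketch of the core idea, under the abstract's premise that updates live in a low-dimensional subspace: shrink the intermediate width of a two-layer linear network to rank k via a balanced truncated SVD. This is illustrative, not the paper's exact compression algorithm.

    # Illustrative width compression of a deep linear network via truncated SVD.
    import numpy as np

    rng = np.random.default_rng(2)
    d, k = 32, 4
    W1 = 0.1 * rng.standard_normal((d, d))
    W2 = 0.1 * rng.standard_normal((d, d))

    U, s, Vt = np.linalg.svd(W2 @ W1)
    W1_c = np.diag(np.sqrt(s[:k])) @ Vt[:k]    # shape (k, d): narrow first layer
    W2_c = U[:, :k] @ np.diag(np.sqrt(s[:k]))  # shape (d, k): narrow second layer

    # For trained networks the spectrum is concentrated, so a small k suffices;
    # these random weights only demonstrate the mechanics.
    err = np.linalg.norm(W2 @ W1 - W2_c @ W1_c) / np.linalg.norm(W2 @ W1)
    print(f"relative error of the width-{k} network: {err:.3f}")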
- Latent diffusion models have been demonstrated to generate high-quality images, while offering efficiency in model training compared to diffusion models operating in the pixel space. However, incorporating latent diffusion models to solve inverse problems remains a challenging problem due to the nonlinearity of the encoder and decoder. To address these issues, we propose ReSample, an algorithm that can solve general inverse problems with pre-trained latent diffusion models. Our algorithm incorporates data consistency by solving an optimization problem during the reverse sampling process, a concept we term hard data consistency. Upon solving this optimization problem, we propose a novel resampling scheme to map the measurement-consistent sample back onto the noisy data manifold and theoretically demonstrate its benefits. Lastly, we apply our algorithm to solve a wide range of linear and nonlinear inverse problems in both natural and medical images, demonstrating that our approach outperforms existing state-of-the-art approaches, including those based on pixel-space diffusion models.
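To make the hard data consistency idea concrete, here is a hedged sketch with linear stand-ins for the decoder and measurement operator (the paper uses a pretrained latent diffusion model and general forward operators): solve an optimization problem starting from the denoised latent estimate, then stochastically resample back to the noise level of the current step. Function names and the noise schedule are placeholders.

    # Sketch of hard data consistency + resampling with linear stand-ins.
    import numpy as np

    rng = np.random.default_rng(3)
    dec = rng.standard_normal((64, 16)) / 8.0    # stand-in decoder D(z) = dec @ z
    meas = rng.standard_normal((32, 64)) / 8.0   # stand-in forward operator A
    y = meas @ dec @ rng.standard_normal(16)     # synthetic measurements

    def hard_data_consistency(z0_hat, steps=300, lr=0.5):
        """Solve min_z 0.5 * ||A(D(z)) - y||^2 from the denoised estimate."""
        z = z0_hat.copy()
        B = meas @ dec
        for _ in range(steps):
            z -= lr * B.T @ (B @ z - y)          # gradient step on the data term
        return z

    def resample(z0, sigma_t):
        """Map the measurement-consistent sample back onto the noisy manifold."""
        return z0 + sigma_t * rng.standard_normal(z0.shape)

    z0_hat = rng.standard_normal(16)             # pretend denoiser output at step t
    z0_star = hard_data_consistency(z0_hat)
    z_t = resample(z0_star, sigma_t=0.1)
    print(np.linalg.norm(meas @ dec @ z0_star - y))  # data residual, near zero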
- We study the low-rank phase retrieval problem, where the objective is to recover a sequence of signals (typically images) given the magnitude of linear measurements of those signals. Existing solutions involve recovering a matrix constructed by vectorizing and stacking each image. These solutions model this matrix as low-rank and leverage the low-rank property to decrease the sample complexity required for accurate recovery. However, when the number of available measurements is more limited, these low-rank matrix models can often fail. We propose an algorithm called Tucker-Structured Phase Retrieval (TSPR) that models the sequence of images as a tensor rather than a matrix and factorizes it using the Tucker decomposition. This factorization reduces the number of parameters that need to be estimated, allowing for a more accurate reconstruction. We demonstrate the effectiveness of our approach on real video datasets under several different measurement models.
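A short sketch of the signal model TSPR fits, assuming only the abstract: the video is a third-order tensor with a Tucker factorization X = G x1 U1 x2 U2 x3 U3, observed through magnitude-only linear measurements of each frame. The recovery loop itself is omitted; ranks and sizes are illustrative.

    # Tucker-structured signal model behind TSPR (recovery loop omitted).
    import numpy as np

    rng = np.random.default_rng(4)
    n1, n2, T = 16, 16, 20                   # frame height, width, frame count
    r1, r2, r3 = 4, 4, 3                     # Tucker ranks

    G = rng.standard_normal((r1, r2, r3))    # core tensor
    U1 = rng.standard_normal((n1, r1))       # factor along frame rows
    U2 = rng.standard_normal((n2, r2))       # factor along frame columns
    U3 = rng.standard_normal((T, r3))        # factor along time
    X = np.einsum("abc,ia,jb,tc->ijt", G, U1, U2, U3)   # full video tensor

    print("dense parameters:", n1 * n2 * T)
    print("Tucker parameters:", G.size + U1.size + U2.size + U3.size)

    m = 100                                  # measurements per frame
    A = rng.standard_normal((m, n1 * n2))
    Y = np.abs(A @ X.reshape(n1 * n2, T))    # magnitudes that TSPR inverts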
- Statistical machine learning algorithms often involve learning a linear relationship between dependent and independent variables. This relationship is modeled as a vector of numerical values, commonly referred to as weights or predictors. These weights allow us to make predictions, and the quality of these weights influences the accuracy of our predictions. However, when the dependent variable inherently possesses a more complex, multidimensional structure, it becomes increasingly difficult to model the relationship with a vector. In this paper, we address this issue by investigating machine learning classification algorithms with multidimensional (tensor) structure. By imposing tensor factorizations on the predictors, we can better model the relationship, as the predictors would take the form of the data in question. We empirically show that our approach works more efficiently than the traditional machine learning method when the data possesses both an exact and an approximate tensor structure. Additionally, we show that estimating predictors with these factorizations also allows us to solve for fewer parameters, making computation more feasible for multidimensional data.
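A hedged sketch of the idea for matrix-shaped data, assuming a rank-1 factorization of the predictor (the paper considers more general tensor factorizations): logistic regression with W = u v^T estimates n1 + n2 parameters instead of n1 * n2.

    # Logistic regression with a rank-1 factorized predictor (illustrative).
    import numpy as np

    rng = np.random.default_rng(5)
    n1, n2, N = 8, 8, 500
    u_true, v_true = rng.standard_normal(n1), rng.standard_normal(n2)
    X = rng.standard_normal((N, n1, n2))
    y = (np.einsum("nij,i,j->n", X, u_true, v_true) > 0).astype(float)

    u = 0.1 * rng.standard_normal(n1)
    v = 0.1 * rng.standard_normal(n2)
    lr = 0.1
    for _ in range(500):
        z = np.einsum("nij,i,j->n", X, u, v)          # logits <X_n, u v^T>
        p = 1.0 / (1.0 + np.exp(-z))                  # sigmoid probabilities
        # alternating-style gradients of the logistic loss for each factor
        gu = np.einsum("n,nij,j->i", p - y, X, v) / N
        gv = np.einsum("n,nij,i->j", p - y, X, u) / N
        u -= lr * gu
        v -= lr * gv

    acc = np.mean((np.einsum("nij,i,j->n", X, u, v) > 0) == (y > 0.5))
    print(f"training accuracy with {n1 + n2} parameters: {acc:.2f}")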